Ranking Hotel Listings on Expedia

May 2022 ~ MSc Course "Data Mining Techniques"

Length:   1 month (at 0.25 FTE)

Programming language:   Python (Pandas, time, datetime, scikit-learn, LightGBM)

Data:   Roughly five million hotel search records, containing the properties of the retrieved hotels, details about the user, and whether the user clicked on and booked each hotel

Problem description:
Rank hotel listings based on their likelihood of being booked

Approach:
First, the data was explored: its shape, then the types, missing values, and distributions of its features. Second, new variables were derived, missing entries were filled, and the categorical features were one-hot encoded. Next, the data was split into training (90%) and validation (10%) sets. Lastly, two LightGBM models were built: a regressor (pointwise approach) and a ranker (listwise approach).
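The preprocessing and split steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the project's actual code: the column names, the median fill strategy, and the choice to split by search (so that all listings of one search stay in the same set) are assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in for the search log: 10 searches with 3 listings each.
# All column names here are hypothetical.
rows = []
for search_id in range(1, 11):
    for position in range(3):
        rows.append({
            "search_id": search_id,
            "prop_id": 100 + position,
            "country": ["NL", "DE", "FR"][position],
            "review_score": float("nan") if position == 2 else 3.5 + position,
            "booked": int(position == search_id % 3),
        })
df = pd.DataFrame(rows)

# Fill missing entries (median is an assumed strategy) and one-hot encode.
df["review_score"] = df["review_score"].fillna(df["review_score"].median())
df = pd.get_dummies(df, columns=["country"])

# 90/10 split that keeps every search intact (assumed; a plain row-level
# split would scatter the listings of one search across both sets).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
train_idx, valid_idx = next(splitter.split(df, groups=df["search_id"]))
train = df.iloc[train_idx].sort_values("search_id", kind="stable")
valid = df.iloc[valid_idx].sort_values("search_id", kind="stable")

# A listwise ranker needs the number of rows per search, in row order.
train_group_sizes = train.groupby("search_id", sort=False).size().to_numpy()
```

The per-search sizes in train_group_sizes are the kind of array that LGBMRanker.fit accepts through its group argument, while the pointwise regressor can be fitted on the same rows without it.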

Results:
After testing different target variables and encoding techniques for the categorical features, multiple hyperparameter combinations were compared using 4-fold cross-validation. On the validation data, the regressor and the ranker reached an NDCG@5 of 0.350 and 0.375, respectively. Next, the models with the best hyperparameters were retrained on all the data (including the held-out set) and scored 0.333 and 0.366, respectively, on the competition test set, suggesting a slight overfit to the validation data. These NDCG@5 scores were interpreted relative to a random baseline ranking (0.156): the regressor and the ranker score roughly 2.1 and 2.3 times the baseline, respectively. In other words, by this metric either model ranks the hotels users go on to book more than twice as well as displaying listings in random order.
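For readers unfamiliar with the metric, the sketch below computes NDCG@5 for a single search and reproduces the comparison with the random baseline. The relevance grades (5 for a booking, 1 for a click, 0 otherwise) and the exponential-gain form of DCG are assumptions chosen for illustration; the write-up does not state which variant was used.

```python
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k items, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances, k=5):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance of the listings in the order a model ranked them for one search
# (assumed grades: 5 = booked, 1 = clicked only, 0 = ignored).
ranked_relevances = [0, 5, 1, 0, 0, 0]
score = ndcg_at_k(ranked_relevances)

# Test-set scores reported above, relative to the random baseline.
baseline = 0.156
lift_regressor = 0.333 / baseline
lift_ranker = 0.366 / baseline
print(f"NDCG@5 = {score:.3f}")
print(f"lift: regressor {lift_regressor:.2f}x, ranker {lift_ranker:.2f}x")
```

The overall score is the mean of this per-search value across all searches, so a model is rewarded most for placing the booked hotel in the first position.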

Finally, the figure below shows the feature importance of the final LGBMRanker model. The most important variable is by far the property ID, implying that specific properties are significantly more likely to be booked than others when they appear in the search results.

[Figure: feature importance of the final LGBMRanker model]
